Machine Learning 
Model Validation 


Predictive vs. Inferential 
Parametric vs. Non-Parametric 
Bias vs. Variance 


Outline 


• Inferential vs. Predictive Statistics 
• Parametric vs. Non-Parametric Models 
• Bias vs. Variance Tradeoff 











Inferential Statistics 


• Using data analysis to infer properties of an underlying probability distribution 
• The analysis infers properties of a population by testing hypotheses and deriving estimates 
• Emphasis on model accuracy 


Predictive Statistics 


• Analyze current and historical data to make predictions about the future 
• Techniques from data mining, predictive modelling, and machine learning 
• Emphasis on model accuracy 





Prediction vs. Inference 

Prediction: predicting Y from X 
• Goal: develop a "best" model (considering all predictors) to predict Y with high accuracy and low error. 
• Answers the question: How can I accurately predict new data points? 
• Example: What mortality levels does the model predict given a certain income and education level? 

Inference: understanding the relationship between X and Y 
• Goal: estimate an association between an outcome variable and a predictor variable (while adjusting for confounders). 
• Answers the question: What do the relationships between the variables mean? 
• Example: Which has the biggest impact on mortality: income or education? 
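
The mortality example can be sketched with one fitted model used in both ways. This is a minimal illustration on synthetic data: the variable names, the true coefficients, and the noise level are all assumptions made for the demo, and both predictors are generated on the same scale so their coefficients can be compared directly.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (an assumption for illustration): mortality depends on
# income and education, with education having the larger effect.
n = 500
income = rng.normal(0.0, 1.0, n)
education = rng.normal(0.0, 1.0, n)
mortality = 10.0 - 1.0 * income - 2.0 * education + rng.normal(0.0, 0.5, n)

# Fit ordinary least squares: mortality ~ intercept + income + education.
X = np.column_stack([np.ones(n), income, education])
coef, *_ = np.linalg.lstsq(X, mortality, rcond=None)

# Inference: inspect the fitted coefficients to compare the predictors.
intercept, b_income, b_education = coef
print("income effect:", round(float(b_income), 2))
print("education effect:", round(float(b_education), 2))

# Prediction: plug a new (income, education) pair into the fitted model.
new_point = np.array([1.0, 0.5, -1.0])  # intercept term, income, education
predicted_mortality = float(new_point @ coef)
print("predicted mortality:", round(predicted_mortality, 2))
```

The inference question reads the coefficients; the prediction question only uses the model's output for a new input.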


Prediction vs. Inference 


Dimensionality: the number of input variables. 

Interpretability: the ability to understand how input variables relate to and cause output results (Higher vs. Lower). 

Temporal Orientation: the emphasis related to time. 





Level of Model Detail 


[Figure: model detail required, across Descriptive, Exploratory, Understand, Predict, Prescribe] 


Goal: Enable domain experts to solve problems in seconds to hours that 
currently take hours to weeks for someone with expertise in statistics 

















Parametric vs. Non-Parametric 





Parametric tests: Z-test, t-test, F-test, ANOVA 
Non-parametric tests: Chi-square test, U-test, H-test 


Parametric vs. Non-Parametric 

[Figure panels: Parametric Model; Nonparametric Model] 


Parametric ML algorithms: 
• Linear Regression 
• Linear Support Vector Machines 
• Logistic Regression 
• Naive Bayes 
• Perceptron 

Nonparametric models: 
• Decision Trees 
• K-Nearest Neighbor 
• Support Vector Machines 
• Artificial Neural Networks 


Advantages of Parametric 

ADVANTAGES 
• Parametric algorithms are SIMPLE to understand, as they work with a limited, fixed number of parameters. 
• They require less computational power, hence are FASTER. 
• They do NOT require perfect or large amounts of data for training. 

DISADVANTAGES 
• Assuming one functional form constrains them to that type of form, i.e. limited flexibility and scope. 
• Limited complexity, i.e. they work better for simple problems and smaller amounts of data. 
• The assumed forms and methods may or MAY NOT match the underlying mapping function. 


Advantages of Non-Parametric 

ADVANTAGES 
• Non-parametric algorithms do not make any assumptions about the functional form and work on trying to best fit the data. 
• They are capable of fitting and learning a large number of functional forms (since there is no constraint to just one type of form). 
• They result in high-performance models for prediction. 

DISADVANTAGES 
• They require large amounts of training data for estimating and constructing mapping functions. 
• They have many parameters to train and work on. Hence, they are SLOWER in giving results. 
• Their flexibility carries a higher risk of OVERFITTING the training data, and no justification is available for certain predictions made. 
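
The trade-off above can be illustrated with a small sketch that fits a straight line (parametric: two parameters, one assumed functional form) and a k-nearest-neighbor regressor (non-parametric: it keeps all training data) to the same nonlinear data. The data-generating function, the sample sizes, and the helper names `fit_line` and `fit_knn` are assumptions chosen for the demo, not part of the slides.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic nonlinear data (assumed for illustration): y = sin(x) + noise.
x_train = rng.uniform(0.0, 6.0, 200)
y_train = np.sin(x_train) + rng.normal(0.0, 0.1, 200)
x_test = np.linspace(0.5, 5.5, 50)
y_test = np.sin(x_test)

def fit_line(x, y):
    """Parametric: a straight line is summarized by two numbers."""
    slope, intercept = np.polyfit(x, y, deg=1)
    return lambda q: slope * q + intercept

def fit_knn(x, y, k=5):
    """Non-parametric: k-NN keeps the training data and averages neighbors."""
    def predict(q):
        q = np.atleast_1d(q)
        # Distance from every query point to every training point.
        dist = np.abs(q[:, None] - x[None, :])
        nearest = np.argsort(dist, axis=1)[:, :k]
        return y[nearest].mean(axis=1)
    return predict

line = fit_line(x_train, y_train)
knn = fit_knn(x_train, y_train, k=5)

mse_line = float(np.mean((line(x_test) - y_test) ** 2))
mse_knn = float(np.mean((knn(x_test) - y_test) ** 2))
print("linear MSE:", mse_line, "k-NN MSE:", mse_knn)
```

Here the assumed linear form does not match the underlying function, so the flexible model fits better; the price is that every prediction scans the stored training set.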














Defining Bias Error 

Bias 
• how different the expected value of the estimator is from the value being estimated 
• measures how far off, in general, these models' predictions are from the correct value 
• the inability of an ML model to capture the true relationship 

[Figure: a high bias model (underfitting) next to a low bias, low variance model (good balance, optimal model)] 














Defining Variance Error 

Variance 
• the error due to variance is taken as the variability of a model prediction for a given data point 
• the variance is how much the predictions for a given point vary between different realizations of the model 
• the difference in fits (good or terrible) between different data sets 

[Figure: a low bias, low variance model (good balance, optimal model) next to a high variance model (overfitting)] 

















Total Error Term 


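[Figure: three fits labeled "High variance (overfitting)", "High bias (underfitting)", and "Low bias, low variance (good balance)"] 

$\mathrm{Err}(x) = \left(E[\hat{f}(x)] - f(x)\right)^2 + E\left[\left(\hat{f}(x) - E[\hat{f}(x)]\right)^2\right] + \sigma^2$ 

$\mathrm{Err}(x) = \mathrm{Bias}^2 + \mathrm{Variance} + \text{Irreducible Error}$ 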
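
The decomposition can be checked numerically. The sketch below repeatedly draws a training set, fits a deliberately simple model, and compares the simulated expected squared error at one point with the sum of squared bias, variance, and noise variance. The true function, noise level, evaluation point `x0`, and model degree are assumptions chosen for the demo.

```python
import numpy as np

rng = np.random.default_rng(2)

def true_f(x):
    return np.sin(x)

sigma = 0.3          # noise standard deviation (irreducible error = sigma**2)
x0 = 1.0             # the point at which the error is decomposed
n_train = 30
n_repeats = 2000
degree = 1           # a deliberately simple (high-bias) model

preds = np.empty(n_repeats)
sq_errors = np.empty(n_repeats)
for i in range(n_repeats):
    # Draw a fresh training set and fit the model to it.
    x = rng.uniform(0.0, 3.0, n_train)
    y = true_f(x) + rng.normal(0.0, sigma, n_train)
    coeffs = np.polyfit(x, y, deg=degree)
    preds[i] = np.polyval(coeffs, x0)
    # Compare against a fresh noisy observation at x0.
    y0 = true_f(x0) + rng.normal(0.0, sigma)
    sq_errors[i] = (y0 - preds[i]) ** 2

bias_sq = (preds.mean() - true_f(x0)) ** 2
variance = preds.var()
total = bias_sq + variance + sigma ** 2
print("bias^2:", bias_sq, "variance:", variance, "sum:", total)
print("simulated Err(x0):", sq_errors.mean())
```

The two printed totals should agree up to Monte Carlo noise; raising `degree` would be expected to shift error from the bias term to the variance term.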


Total Error Term 


[Figure: 2x2 grid with columns "Low Variance" and "High Variance" and rows "Low Bias" and "High Bias"; one direction is labeled "Higher variance; lower bias", the other "Lower variance; higher bias"] 

Mean squared error (MSE) combines the notions of bias and standard error. It is defined as 

$\mathrm{MSE} = E\left[(\hat{\theta} - \theta)^2\right] = (\text{standard error})^2 + (\text{bias})^2$ 
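
A quick numerical check of this identity, using a deliberately biased estimator. The true value, the sample size, and the shrinkage factor 0.9 are assumptions chosen for the demo; the standard error is taken as the standard deviation of the estimator across repeated experiments.

```python
import numpy as np

rng = np.random.default_rng(3)

theta = 5.0        # true value being estimated (assumed for the demo)
n = 20             # sample size per experiment
n_repeats = 50_000

# A deliberately biased estimator: shrink the sample mean toward zero.
samples = rng.normal(theta, 2.0, size=(n_repeats, n))
estimates = 0.9 * samples.mean(axis=1)

mse = np.mean((estimates - theta) ** 2)
standard_error = estimates.std()          # spread of the estimator
bias = estimates.mean() - theta           # systematic offset

print("MSE:", mse)
print("SE^2 + bias^2:", standard_error ** 2 + bias ** 2)
```

Shrinking lowers the spread of the estimator but introduces a bias of roughly 0.9 * 5 - 5 = -0.5, so both terms contribute to the MSE.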











Bias vs. Variance - Model Complexity 

[Figure: error vs. model complexity, showing the variance and total error curves and the optimum model complexity] 


Bias vs. Variance - Over/Under-fitting 

[Figure: predictive error vs. model complexity, with curves for error on test data and error on training data; underfitting and overfitting regions lie on either side of the ideal range for model complexity] 


Bias vs. Variance - Degrees of Freedom 

[Figure: "Bias-Variance Tradeoff, Cubic Smoothing Spline"; x-axis: degrees of freedom (0 to 40); curves: MSE, squared bias, variance] 


Bias vs. Variance - Number of data points 

[Figure: x-axis: number of data points] 


Lowering Bias or Lowering Variance 

Start training 

High training error? (High Bias) 
• Train longer 
• Train a more complex model 
• Obtain more features 
• Decrease regularization 
• New model architecture 

High cross-validation error? (High Variance) 
• Obtain more data 
• Decrease number of features 
• Increase regularization 
• New model architecture 
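
The flowchart's two checks can be written as a small helper. This is only a sketch: the function name `suggest_next_steps` and the `acceptable_error` threshold are hypothetical, since the flowchart only asks whether each error is "high" without saying how high.

```python
def suggest_next_steps(train_error, cv_error, acceptable_error):
    """Return (diagnosis, suggestions) following the flowchart's two checks.

    The threshold `acceptable_error` is a hypothetical parameter: the
    flowchart only asks whether each error is "high".
    """
    # First check: a high training error points to high bias.
    if train_error > acceptable_error:
        return "high bias", [
            "train longer",
            "train a more complex model",
            "obtain more features",
            "decrease regularization",
            "try a new model architecture",
        ]
    # Second check: low training error but high CV error points to high variance.
    if cv_error > acceptable_error:
        return "high variance", [
            "obtain more data",
            "decrease number of features",
            "increase regularization",
            "try a new model architecture",
        ]
    return "done", []

diagnosis, steps = suggest_next_steps(train_error=0.02, cv_error=0.15,
                                      acceptable_error=0.05)
print(diagnosis, steps)
```

The training error is checked first, so a model that fails both checks is treated as a bias problem until its training error comes down.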
Bias vs. Variance - Degree of polynomial 

[Figure: error vs. degree of polynomial d, including the training error curve; a high bias problem at one end of the axis and a high variance problem at the other] 
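
A sweep over the polynomial degree d shows the pattern in miniature. The data-generating function, the sample sizes, and the three degrees tried are assumptions for the demo; exact numbers depend on the random seed, so no particular output is claimed.

```python
import numpy as np

rng = np.random.default_rng(4)

def make_data(n):
    """Draw n noisy samples of an assumed true function sin(3x)."""
    x = rng.uniform(-1.0, 1.0, n)
    y = np.sin(3.0 * x) + rng.normal(0.0, 0.2, n)
    return x, y

x_train, y_train = make_data(20)
x_val, y_val = make_data(200)

train_err, val_err = {}, {}
for d in (1, 3, 12):
    # Fit a degree-d polynomial on the training set only.
    coeffs = np.polyfit(x_train, y_train, deg=d)
    train_err[d] = float(np.mean((np.polyval(coeffs, x_train) - y_train) ** 2))
    val_err[d] = float(np.mean((np.polyval(coeffs, x_val) - y_val) ** 2))
    print(f"d={d:2d}  train={train_err[d]:.3f}  validation={val_err[d]:.3f}")
```

Training error can only go down as d grows, because each higher-degree model contains the lower-degree ones. A low d with high error on both sets suggests a bias problem, while a high d with a large gap between training and validation error suggests a variance problem.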


Signs of a High Bias ML Model 

Signs of High Bias & High Variance 

Signs of a High Variance ML Model 





Types of Model Optimization 











Performance estimation 
  Large dataset 
    ▪ 2-way holdout method (train/test split) 
    ▪ Confidence interval via normal approximation 
  Small dataset 
    ▪ (Repeated) k-fold cross-validation without independent test set 
    ▪ Leave-one-out cross-validation without independent test set 
    ▪ Confidence interval via 0.632(+) bootstrap 

Model selection (hyperparameter optimization) and performance estimation 
  Large dataset 
    ▪ 3-way holdout method (train/validation/test split) 
  Small dataset 
    ▪ (Repeated) k-fold cross-validation with independent test set 
    ▪ Leave-one-out cross-validation with independent test set 

Model & algorithm comparison 
  Large dataset 
    ▪ Multiple independent training sets + test sets (algorithm comparison, AC) 
    ▪ McNemar test (model comparison, MC) 
    ▪ Cochran's Q + McNemar test (MC) 
  Small dataset 
    ▪ Combined 5x2cv F test (AC) 
    ▪ Nested cross-validation (AC) 
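
As one concrete instance of the methods listed above, k-fold cross-validation can be sketched in a few lines of NumPy. The helper names `k_fold_indices` and `cross_val_mse`, the polynomial model, and the synthetic data are assumptions for illustration; a library routine would normally be used in practice.

```python
import numpy as np

def k_fold_indices(n_samples, k, rng):
    """Shuffle sample indices and split them into k nearly equal folds."""
    return np.array_split(rng.permutation(n_samples), k)

def cross_val_mse(x, y, degree, k=5, seed=0):
    """Average validation MSE of a polynomial fit over k folds."""
    rng = np.random.default_rng(seed)
    folds = k_fold_indices(len(x), k, rng)
    scores = []
    for i, val_idx in enumerate(folds):
        # Train on every fold except the i-th, validate on the i-th.
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        coeffs = np.polyfit(x[train_idx], y[train_idx], deg=degree)
        pred = np.polyval(coeffs, x[val_idx])
        scores.append(np.mean((pred - y[val_idx]) ** 2))
    return float(np.mean(scores))

# Synthetic quadratic data, assumed for illustration only.
rng = np.random.default_rng(5)
x = rng.uniform(-1.0, 1.0, 60)
y = 1.0 + 2.0 * x - 3.0 * x ** 2 + rng.normal(0.0, 0.1, 60)

# Model selection: pick the degree with the lowest cross-validated error.
cv_scores = {d: cross_val_mse(x, y, d) for d in (1, 2)}
best_degree = min(cv_scores, key=cv_scores.get)
print(cv_scores, "selected degree:", best_degree)
```

Because the same folds were used to choose the degree, the winning cross-validation score is an optimistic estimate; this is why the chart pairs model selection with an independent test set.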

















High Bias vs. High Variance 


Bias vs. Variance 

Model: • train data, ° test data 

High bias: 
- Underfitted to data 
- Model does not capture the data pattern/trend 
- High error on both train & test data 

High variance: 
- Flexible model (++ parameters) 
- Overfitted to data 
- Model is sensitive to noise, does not generalize well to new unseen data 
- Low error on train data 
- High error on test data 








Parametric vs. Non-Parametric 

Situation: choosing a parametric test / choosing a non-parametric test 
• [...] groups: [...] test / [...] 
• [...]: [...] / Kruskal-Wallis test 
• Repeated measures, 2 conditions: Matched-pair t-test / Wilcoxon test 
• Repeated measures, >2 conditions: One-way, repeated [...] / [...] 





